Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

Authors

  • Juho Lee
  • Seungjin Choi
Abstract

Bayesian hierarchical clustering (BHC) is an agglomerative clustering method in which a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring small-variance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model; it exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure that emerges from the asymptotic limit of the BHC model, which allows us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.
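To illustrate why reducibility matters, here is a minimal sketch of the nearest neighbor chain algorithm. The abstract does not state the RBHC dissimilarity, so this sketch substitutes Ward's dissimilarity, a standard measure known to be reducible; the names `ward` and `nn_chain` and the cluster representation (size, mean) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def ward(a, b):
    """Ward's dissimilarity between two clusters given as (size, mean).
    Stand-in for the RBHC dissimilarity, which the abstract does not give."""
    (na, ma), (nb, mb) = a, b
    return na * nb / (na + nb) * float(np.sum((ma - mb) ** 2))

def nn_chain(points, dissim=ward):
    """Agglomerative clustering via the nearest neighbor chain algorithm.
    Valid only when `dissim` is reducible. Returns a list of merges
    (id_a, id_b, dissimilarity); merged clusters receive fresh ids."""
    clusters = {i: (1, np.asarray(p, dtype=float)) for i, p in enumerate(points)}
    next_id = len(clusters)
    merges, chain = [], []
    while len(clusters) > 1:
        if not chain:
            chain.append(min(clusters))  # arbitrary starting cluster
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Find the nearest neighbor of `top`; on ties prefer `prev`
        # so that the chain is guaranteed to terminate.
        best, best_d = None, float("inf")
        for c in clusters:
            if c == top:
                continue
            d = dissim(clusters[top], clusters[c])
            if d < best_d or (d == best_d and c == prev):
                best, best_d = c, d
        if best == prev:
            # `top` and `prev` are reciprocal nearest neighbors: merge them.
            chain.pop()
            chain.pop()
            na, ma = clusters.pop(prev)
            nb, mb = clusters.pop(top)
            n = na + nb
            clusters[next_id] = (n, (na * ma + nb * mb) / n)
            merges.append((prev, top, best_d))
            next_id += 1
            # Reducibility guarantees the rest of the chain is still valid.
        else:
            chain.append(best)
    return merges

if __name__ == "__main__":
    data = [[0.0], [1.0], [10.0], [11.0]]
    for a, b, d in nn_chain(data):
        print(a, b, d)
```

With a reducible dissimilarity, merging two reciprocal nearest neighbors never brings the merged cluster closer to a third cluster than either part was, so the remaining chain can be reused rather than rebuilt. The merges may come out in a different order than in greedy agglomeration; sorting them by dissimilarity recovers the usual dendrogram ordering.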

Similar resources

Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., ...

Detailed Derivations of Small-Variance Asymptotics for some Hierarchical Bayesian Nonparametric Models

Numerous flexible Bayesian nonparametric models and associated inference algorithms have been developed in recent years for solving problems such as clustering and time series analysis. However, simpler approaches such as k-means remain extremely popular due to their simplicity and scalability to the large-data setting. The k-means optimization problem can be viewed as the small-variance limit ...

Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints

The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying seman...

Small Variance Asymptotics for Non-Parametric Online Robot Learning

Small variance asymptotics is emerging as a useful technique for inference in large scale Bayesian non-parametric mixture models. This paper analyses the online learning of robot manipulation tasks with Bayesian non-parametric mixture models under small variance asymptotics. The analysis yields a scalable online sequence clustering (SOSC) algorithm that is non-parametric in the number of cluste...

MAP for Exponential Family Dirichlet Process Mixture Models

The Dirichlet process mixture (DPM) is a ubiquitous, flexible Bayesian nonparametric model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPM-based methods, which have considerable potential, are restricted to applications in which computational resources and time f...

Journal:
  • CoRR

Volume abs/1501.07430

Publication year 2015